
    COVID-19 Lung Segmentation

    The COVID-19 Lung Segmentation project provides a novel, unsupervised, and fully automated pipeline for the semantic segmentation of ground-glass opacity (GGO) areas in chest Computed Tomography (CT) scans of patients affected by COVID-19. The project provides a series of scripts and functions for the automated segmentation of 3D lung regions, the segmentation of GGO areas, and the estimation of radiomic features.

    A network approach for low dimensional signatures from high throughput data

    One of the main objectives of high-throughput genomics studies is to obtain a low-dimensional set of observables (a signature) for sample classification purposes (diagnosis, prognosis, stratification). Biological data, such as gene or protein expression, are commonly characterized by up/down regulation behavior, for which discriminant-based methods can perform with high accuracy and easy interpretability. To get the most out of these methods, feature selection is even more critical, but it is known to be an NP-hard problem, and thus most feature selection approaches focus on one feature at a time (k-best, Sequential Feature Selection, recursive feature elimination). We propose DNetPRO, Discriminant Analysis with Network PROcessing, a supervised network-based signature identification method. This method implements a network-based heuristic to generate one or more signatures out of the best-performing feature pairs. The algorithm is easily scalable, allowing efficient computing for a high number of observables ([Formula: see text]-[Formula: see text]). We show applications on real high-throughput genomic datasets in which our method outperforms existing results, or is compatible with them but with a smaller number of selected features. Moreover, the geometrical simplicity of the resulting class-separation surfaces allows a clearer interpretation of the obtained signatures in comparison to nonlinear classification models.
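
    The pair-based idea described in this abstract can be illustrated with a minimal sketch. This is not the DNetPRO implementation: the nearest-centroid classifier, the leave-one-out scoring, the `top_k` cutoff, and the use of connected components as signatures are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a list of equal-length points."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def loo_accuracy(points, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (p, y) in enumerate(zip(points, labels)):
        by_class = {}
        for j, (q, z) in enumerate(zip(points, labels)):
            if j != i:
                by_class.setdefault(z, []).append(q)
        cents = {c: centroid(rows) for c, rows in by_class.items()}
        pred = min(
            cents,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, cents[c])),
        )
        correct += pred == y
    return correct / len(points)


def pair_signatures(X, y, top_k):
    """Score every feature pair, keep the top_k pairs as network edges,
    and return the connected components as candidate signatures."""
    scored = []
    for f, g in combinations(range(len(X[0])), 2):
        pts = [(row[f], row[g]) for row in X]
        scored.append((loo_accuracy(pts, y), (f, g)))
    # Best accuracy first; ties broken by feature index for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    edges = [pair for _, pair in scored[:top_k]]

    parent = {}

    def find(a):
        # Union-find root lookup with path halving.
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for f, g in edges:
        parent[find(f)] = find(g)

    comps = {}
    for node in list(parent):
        comps.setdefault(find(node), set()).add(node)
    return sorted(sorted(c) for c in comps.values())


# Toy data (made up): features 0 and 1 separate the classes, 2 and 3 are noise.
X = [
    [0.0, 0.1, 5, 1], [0.2, 0.0, 1, 4], [0.1, 0.2, 3, 2],
    [1.0, 1.1, 4, 2], [1.2, 0.9, 2, 1], [0.9, 1.0, 3, 5],
]
y = [0, 0, 0, 1, 1, 1]
signatures = pair_signatures(X, y, top_k=1)
```

    Scoring pairs costs a number of classifier evaluations that is quadratic in the number of features, which is why a heuristic over the best pairs is used instead of an exhaustive search over all feature subsets.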

    SGSI project at CNAF

    The Italian Tier1 center is mainly focused on LHC and physics experiments in general. Recently we tried to widen our area of activity and established a collaboration with the University of Bologna to set up an area inside our computing center for hosting experiments with demanding security and privacy requirements on stored data. The first experiment we are going to host is Harmony, a project that is part of IMI's Big Data for Better Outcomes programme (IMI stands for Innovative Medicines Initiative). In order to be able to accept this kind of data, we had to make a subset of our computing center compliant with the ISO 27001 standard. In this article we describe the SGSI project (Sistema Gestione Sicurezza Informazioni, Information Security Management System), with details of all the processes we have been through in order to become ISO 27001 compliant, and with a particular focus on the separation of the project-dedicated resources from all the others hosted in the center. We also describe the software solutions adopted to allow this project to accept, in the future, any experiment or collaboration in need of this kind of security procedure.

    Intraspecies characterization of bacteria via evolutionary modeling of protein domains

    The ability to detect and characterize bacteria within a biological sample is crucial for the monitoring of infections and epidemics, as well as for the study of human health and its relationship with commensal microorganisms. To this end, a commonly used technique is 16S rRNA gene-targeted sequencing. PCR-amplified 16S sequences derived from the sample of interest are usually clustered into so-called Operational Taxonomic Units (OTUs) based on pairwise similarities. Then, representative OTU sequences are compared with reference (human-made) databases to derive their phylogeny and taxonomic classification. Here, we propose a new reference-free approach to define the phylogenetic distance between bacteria based on protein domains, which are the evolving units of proteins. We extract the protein domain profiles of 3368 bacterial genomes and use an ecological approach to model their Relative Species Abundance distribution. Based on the model parameters, we then derive a new measurement of phylogenetic distance. Finally, we show that this model-based distance is capable of detecting differences between bacteria in cases in which the 16S rRNA-based method fails, providing a possibly complementary approach, which is particularly promising for the analysis of bacterial populations measured by shotgun sequencing.

    Greenhouse gas emissions from the grassy outdoor run of organic broilers

    Nitrous oxide (N₂O), methane (CH₄) and carbon dioxide (CO₂) fluxes over the grassy outdoor run of organically grown broilers were monitored using static chambers over two production batches in contrasting seasons. Measured N₂O and CH₄ fluxes were extremely variable in time and space for both batches, with fluxes ranging from a small uptake by soil to large emission peaks, the latter of which always occurred in the chambers located closest to the broiler house. In general, fluxes decreased with increasing distance from the broiler house, demonstrating that the foraging of broilers and the amount of excreted nutrients (carbon, nitrogen) largely control the spatial variability of emissions. Spatial integration by kriging methods was carried out to provide representative fluxes on the outdoor run for each measurement day. Mechanistic relationships between plot-scale estimates and environmental conditions (soil temperature and water content) were calibrated in order to fill gaps between measurement days. Flux integration over the year 2010 showed that around 3 ± 1 kg N₂O-N ha⁻¹ were emitted on the outdoor run, equivalent to 0.9% of outdoor N excretion and substantially lower than the IPCC default emission factor of 2%. By contrast, the outdoor run was found to be a net CH₄ sink of about −0.56 kg CH₄-C ha⁻¹, though this sink compensated less than 1.5% (in CO₂ equivalents) of N₂O emissions. The net greenhouse gas (GHG) budget of the outdoor run is explored, based on measured GHG fluxes and short-term (1.5 yr) variations in soil organic carbon.
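
    The CO₂-equivalent comparison quoted in this abstract can be checked with a short back-of-the-envelope calculation. The abstract does not say which global warming potentials were used; the sketch below assumes the 100-year IPCC AR4 values (298 for N₂O, 25 for CH₄), and the function and variable names are illustrative only.

```python
# Assumed 100-year global warming potentials (IPCC AR4); the abstract
# does not state which values the authors used.
GWP_N2O = 298.0
GWP_CH4 = 25.0


def n2o_n_to_co2e(kg_n2o_n):
    """Convert kg N2O-N to kg CO2-eq (N2O weighs 44 g/mol, its two N atoms 28)."""
    return kg_n2o_n * 44.0 / 28.0 * GWP_N2O


def ch4_c_to_co2e(kg_ch4_c):
    """Convert kg CH4-C to kg CO2-eq (CH4 weighs 16 g/mol, its C atom 12)."""
    return kg_ch4_c * 16.0 / 12.0 * GWP_CH4


# Figures taken from the abstract (per hectare, year 2010).
n2o_co2e = n2o_n_to_co2e(3.0)        # N2O emission
ch4_co2e = ch4_c_to_co2e(0.56)       # magnitude of the CH4 sink
offset_fraction = ch4_co2e / n2o_co2e

# If 3 kg N2O-N is 0.9% of outdoor N excretion, the implied excretion is:
implied_n_excretion = 3.0 / 0.009    # kg N per hectare

print(f"N2O: {n2o_co2e:.0f} kg CO2-eq/ha, CH4 sink: {ch4_co2e:.1f} kg CO2-eq/ha")
print(f"CH4 sink offsets {offset_fraction:.2%} of N2O emissions")
print(f"Implied outdoor N excretion: {implied_n_excretion:.0f} kg N/ha")
```

    Under these assumed potentials the sink offsets roughly 1.3% of the N₂O emissions, consistent with the "less than 1.5%" stated above; other GWP choices shift this ratio.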

    Optimized pipeline of MuTect and GATK tools to improve the detection of somatic single nucleotide polymorphisms in whole-exome sequencing data

    Background: Detecting somatic mutations in whole-exome sequencing data of cancer samples has become a popular approach for profiling cancer development, progression and chemotherapy resistance. Several studies have proposed software packages, filters and parametrizations. However, many research groups have reported low concordance among different methods. We aimed to develop a pipeline that detects a wide range of single nucleotide mutations with high validation rates. We combined two standard tools, Genome Analysis Toolkit (GATK) and MuTect, to create the GATK-LODN method. As proof of principle, we applied our pipeline to exome sequencing data of hematological (Acute Myeloid and Acute Lymphoblastic Leukemias) and solid (Gastrointestinal Stromal Tumor and Lung Adenocarcinoma) tumors. We performed experiments on simulated data to test the sensitivity and specificity of our pipeline. Results: MuTect presented the highest validation rate (90%) for mutation detection, but detected a limited number of somatic mutations. GATK detected a high number of mutations but with low specificity. GATK-LODN increased the performance of GATK variant detection (from 5 of 14 to 3 of 4 confirmed variants), while preserving mutations not detected by MuTect. However, GATK-LODN filtered more variants in the hematological samples than in the solid tumors. Experiments on simulated data demonstrated that GATK-LODN increased both the specificity and the sensitivity of GATK results. Conclusion: We presented a pipeline that detects a wide range of somatic single nucleotide variants, with good validation rates, from exome sequencing data of cancer samples. We also showed the advantage of combining standard algorithms to create the GATK-LODN method, which increased the specificity and sensitivity of GATK results. This pipeline can be helpful in discovery studies that aim to profile the somatic mutational landscape of cancer genomes.

    Outcome Prediction for SARS-CoV-2 Patients Using Machine Learning Modeling of Clinical, Radiological, and Radiomic Features Derived from Chest CT Images

    Featured Application: The present study demonstrates that semi-automatic segmentation enables the identification of regions of interest affected by SARS-CoV-2 infection for the extraction of prognostic features from chest CT scans without suffering from the inter-operator variability typical of segmentation, hence offering a valuable and informative second opinion. Machine Learning methods allow identification of the prognostic features potentially reusable for the early detection and management of other similar diseases. (1) Background: Chest Computed Tomography (CT) has been proposed as a non-invasive method for confirming the diagnosis of SARS-CoV-2 patients using radiomic features (RFs) and baseline clinical data. The performance of Machine Learning (ML) methods using RFs derived from semi-automatically segmented lungs in chest CT images was investigated regarding the ability to predict the mortality of SARS-CoV-2 patients. (2) Methods: A total of 179 RFs extracted from 436 chest CT images of SARS-CoV-2 patients, and 8 clinical and 6 radiological variables, were used to train and evaluate three ML methods (Least Absolute Shrinkage and Selection Operator [LASSO] regularized regression, Random Forest Classifier [RFC], and the Fully connected Neural Network [FcNN]) for their ability to predict mortality using the Area Under the Curve (AUC) of Receiver Operating Characteristic (ROC) curves. These three groups of variables were used separately and together as input for constructing and comparing the final performance of ML models. (3) Results: All the ML models using only RFs achieved an informative level regarding predictive ability, outperforming radiological assessment, without, however, reaching the performance obtained with ML based on clinical variables. The LASSO regularized regression and the FcNN performed equally, both being superior to the RFC. (4) Conclusions: Radiomic features based on semi-automatically segmented CT images and ML approaches can aid in identifying patients with a high risk of mortality, allowing a fast, objective, and generalizable method for improving prognostic assessment by providing a second expert opinion that outperforms human evaluation.
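
    The models in this abstract are compared by the area under the ROC curve. As a minimal illustration of what that metric measures, and not the study's evaluation code, the sketch below computes the AUC through its pairwise-ranking interpretation: the probability that a randomly chosen positive case (here, a death) receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The labels and scores are made-up toy values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 outcomes; scores: predicted risk per case.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is what makes the metric convenient for comparing models trained on different groups of variables.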